Plant disease classification using deep learning techniques is a popular
research area due to the numerous opportunities for introducing advanced and
robust classifiers. Nevertheless, classifying chilli plant diseases accurately
from images captured under uncontrolled environments and various imaging
conditions remains unsolved due to the lack of chilli disease image datasets.
In this study, the efficacy of three high-performance deep learning
algorithms, namely VGG16, InceptionV3, and EfficientNetB0, in classifying
three types of chilli leaf diseases, namely upward curling,
mosaic/mottling, and bacterial spot, is demonstrated. These methods are
popularly used for other plant disease classifications due to their
effectiveness. The experiments were performed on 3,000 chilli plant
disease images collected from three different field environments in Selangor,
Malaysia. The images were captured with complex backgrounds and
various illuminations, angles, and distances to reflect real-life scenarios.
The complexity of the collected images was designed based on the taxonomic
information of chilli leaf diseases and the unavailability of chilli disease
images under various imaging conditions in publicly available plant
disease databases. With appropriate specifications, all three
models demonstrated outstanding performance, exceeding 95%
accuracy, with the highest accuracy of 98.83% achieved by InceptionV3.
1. INTRODUCTION

Chilli, scientifically known as Capsicum annuum L., is Malaysia’s most widely cultivated crop [1] and
is listed among the top ten selected agricultural commodities by self-sufficiency ratio (SSR), with the
highest import dependency ratio (IDR) of 73.6% [2]. The five widely domesticated
species planted as annual crops are Capsicum annuum, Capsicum frutescens, Capsicum chinense, Capsicum
baccatum, and Capsicum pubescens [3]. Plagues and diseases easily infect these plants. The effects of the
infection are a significant reduction in chilli production and deterioration of fruit quality, resulting in low
returns for farmers [4]. According to [5], chilli plant diseases are mainly due to infection by
pathogenic microbes, namely fungi, bacteria, and viruses. The infection is visible but needs to be examined
closely and adequately controlled to avoid the massive spread of disease on the farm. The conventional way of
detecting and classifying plant diseases is time-consuming, so automatic detection and classification
approaches have been introduced to tackle this problem. In the late 1990s, conventional computer vision
techniques were used for the automatic detection and identification of chilli plant diseases [6]. The major
weakness of traditional computer vision techniques was that they proved successful only on simpler,
controlled setups and struggled as the operational conditions changed [7].

Over the years, automatic detection and classification of plant diseases using image
processing and deep learning approaches have received significant attention from experts in the
field. Deep learning is a branch of artificial intelligence that allows machines to perform impressive
recognition, prediction, and filtration [8]. Many practical and reliable deep learning algorithms have been
proposed for plant disease classification [9]-[13]. Typically, the classification is performed according to the
infected leaf shape and a detectable change in the leaf colour caused by fungal, bacterial, or viral infection.
The application of the transfer learning approach for deep learning has also received significant attention.
Transfer learning has emerged as a powerful technique whereby the knowledge gained from a larger dataset is
transferred to a new dataset [14], [15]. In scenarios with insufficient training data, this technique is
beneficial, as presented in research by [16]. In transfer learning, pre-trained models are generally trained on
a large-scale dataset, such as ImageNet, which contains millions of real images. The advantage is that the
learned features are transferred through the weights and the architecture obtained from these models [17].
Inspired by these findings, the performance of the pre-trained VGG16, InceptionV3, and EfficientNetB0 models
in classifying chilli plant disease images captured under an uncontrolled environment with various imaging
conditions and a small dataset is studied. This paper shows the performance of these models in classifying
highly complex chilli plant disease images. The findings in this paper will create more opportunities for
developing more accurate classifiers in the future, since existing studies have reported less than 90%
accuracy on a particular type of chilli disease [18], [19]. This paper is organized as follows. Section 2
describes the chilli plant disease taxonomy, section 3 presents the materials, the architectures of the deep
learning methods used, and the experimental setup, and section 4 discusses the results. Finally, the paper is
concluded in section 5.


2. TAXONOMY OF CHILI DISEASES 

Chilli is a type of plant that can be easily affected by fungi, bacteria, viruses, and pests. In addition,
climate change and the risk of a resistance breakdown can also affect the durability of disease resistance.
Examples of the fungi, bacteria, viruses, and pests that commonly affect chilli plants [5] are summarized
in Figure 1.
Figure 1. Taxonomy of chilli plant diseases according to [5]


In this study, three types of disease were considered: bacterial spot, upward curling and
mosaic/mottling, as shown in Figure 2(a), Figure 2(b), and Figure 2(c), respectively. Bacterial spot appears as
small, black, water-soaked spots on the leaves that gradually turn brown, coalesce, and become rugged and
cracked. It is mainly caused by a pathogen known as Xanthomonas. The upward curling disease is caused by
Begomovirus, transmitted by Bemisia whiteflies, and causes yellowing of the veins and reduced leaf size. The
mosaic disease, which is transmitted mainly by greenfly aphids, causes the leaves to become yellowed and narrow.
Figure 2. Samples of chili plant disease images used in the experiments, (a) bacterial spot, (b) upward
curling, and (c) mosaic/mottling


3. MATERIAL AND METHODS 
3.1. Chili plant disease dataset 

The dataset used in this study consists of 3,000 images of Capsicum annuum L. plants annotated
into three classes of chilli leaf diseases: upward curling, mosaic/mottling and bacterial spot.
The images were self-collected under an uncontrolled environment and various illuminations, views, and
distances to reflect real-life scenarios. The images were collected from three different field environments
located at Sijangkang, Selangor; Community Urban Farm, Bukit Rimau, Selangor; and a greenhouse at the
Faculty of Engineering, Universiti Putra Malaysia (UPM), Selangor. The images were captured using an Apple
iPhone 7 at a resolution of 3024x4032 and an Asus Zenfone 2 at a resolution of
2304x4096. These images were cropped, resized and flipped manually using Microsoft Photos at the initial
stage to reduce background clutter and occlusion issues. Data annotation was done by consulting the
experts at the farms and cross-checking with related published papers. For each disease, 800 and 200
images were used for training and testing, respectively.
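
The per-class split described above can be sketched as follows. This is only an illustration under stated assumptions: the label names, the placeholder file names, the random seed, and the helper `split_per_class` are hypothetical and are not taken from the study, which does not state how images were assigned to each subset.

```python
import random

# Hypothetical label names for the three disease classes.
CLASSES = ["upward_curling", "mosaic_mottling", "bacterial_spot"]

def split_per_class(paths_by_class, n_train=800, n_test=200, seed=0):
    """Shuffle each class independently and cut it into train and test lists."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in paths_by_class.items():
        if len(paths) < n_train + n_test:
            raise ValueError(f"class {label!r} has only {len(paths)} images")
        shuffled = list(paths)
        rng.shuffle(shuffled)
        train += [(p, label) for p in shuffled[:n_train]]
        test += [(p, label) for p in shuffled[n_train:n_train + n_test]]
    return train, test

# Placeholder file names stand in for the real image paths.
demo = {c: [f"{c}/img_{i:04d}.jpg" for i in range(1000)] for c in CLASSES}
train_set, test_set = split_per_class(demo)
print(len(train_set), len(test_set))  # 2400 600
```

Splitting each class separately keeps the three classes balanced in both subsets, which matches the 800/200 figures reported per disease.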


3.2. Pretrained DCNN model and parameters 

In this study, the performance of VGG16, InceptionV3 and EfficientNetB0 in classifying chilli
plant diseases from complex images was compared. These models were selected for their outstanding
performance when classifying the plant disease images from the ImageNet dataset [20]. VGG16, as
illustrated in Figure 3 [21], uses a recommended default input image size of 224x224x3 and 13
convolutional layers with a rectified linear unit (ReLU) activation function. The convolutional layers
feed into a max-pooling layer, three fully connected (FC) layers and a Softmax function at the end of the
architecture. For this study, the last FC layer was replaced by a layer with three outputs, one for each of
the three classes of chilli plant diseases under study.
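
To make the role of the final Softmax function concrete, the sketch below converts three raw output scores into class probabilities. The scores and the label order are made up for illustration and do not come from the trained models.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed label order; the study does not state the output ordering.
LABELS = ["upward curling", "mosaic/mottling", "bacterial spot"]

probs = softmax([2.0, 0.5, -1.0])  # made-up scores for one image
predicted = LABELS[probs.index(max(probs))]
print(predicted)  # upward curling
```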

Meanwhile, InceptionV3 [22] has a total of 42 layers, with a grid size-reduction block
between the inception module blocks and one auxiliary classifier at the third concatenated trunk, as shown
in Figure 4. The recommended size of the input image for this model is 299x299x3. At the first stage, five
convolutional layers and two max-pooling layers process the input image. Then, a series of inception
modules processes the result before classification is finally performed using a fully connected layer and a
Softmax function. EfficientNet [20] is a convolutional neural network architecture with a compound scaling
method that uniformly scales the depth, width, and resolution dimensions to pursue better accuracy and
efficiency. Currently, there are seven scaled versions of the EfficientNet network, each of which is scaled
up from the EfficientNetB0 baseline using a different compound
coefficient. The EfficientNetB0 network consists of a convolutional layer, several mobile inverted
bottleneck convolutional (MBConv) layers and optimization layers, as shown in Figure 5. The recommended
size of the input image for this model is 224x224x3.
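
The compound scaling idea can be sketched numerically. The base values below (1.2 for depth, 1.1 for width, 1.15 for resolution) are those reported in the EfficientNet paper [20], not in this study, and the helper `compound_scale` is a hypothetical name; the released variants may use rounded multipliers, so the printed values are illustrative only.

```python
# Depth, width and resolution bases as reported in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# The bases satisfy alpha * beta^2 * gamma^2 ~ 2, so each unit increase
# in phi roughly doubles the computational cost (FLOPs).
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(0, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```

With phi = 0 all multipliers equal 1, which corresponds to the EfficientNetB0 baseline used in this study.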
Figure 3. The architecture of the VGG16 network [21]
Figure 5. The architecture of the EfficientNetB0 network [23]


3.3. Experimental setup 

The experiment was conducted on a 64-bit operating system with an x64-based
Intel(R) Core(TM) i5-10200H CPU @ 2.40 GHz, an NVIDIA GeForce GTX 1650 GPU and 8 GB of RAM. All
deep learning models were compiled with GPU support. The proposed chilli plant disease classification is
shown in Figure 6. The filters, feature maps, pooling layers and hyperparameters of the VGG16, InceptionV3
and EfficientNetB0 models retain the same structure as obtained from the Keras Applications API with
ImageNet weights [24]. However, the classification part, a combination of fully connected layers and Softmax
activation, was modified to produce three outputs representing the chilli plant disease classes (upward curling,
bacterial spot and mosaic/mottling). The layers of all pre-trained models were frozen to prevent Keras from
updating their weights during training. Other settings include batch normalization and batch size. Based
on [25], each pixel value of the images was divided by 255 for batch normalization, and the selected batch
size was 32. Batch normalization could overcome the problem of internal covariate shift, which can impede
the training of deep neural networks. Stochastic gradient descent (SGD) was used as the optimizer due to its
high performance [26], while the learning rate of 0.0001 was adopted based on [16]. The number of epochs was
set to 50, selected after several trials with 10, 30, 50 and 100 epochs. The results showed that 50
epochs produced high accuracy and better processing stability.
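
A minimal sketch of how such a setup might be expressed with the Keras Applications API is given below. The hyperparameter values come from the text above, but the head layout (Flatten followed by a single three-way Dense layer), the categorical cross-entropy loss, and the names `CONFIG`, `INPUT_SIZES` and `build_classifier` are assumptions, since the exact head and loss function are not stated.

```python
# Hyperparameters reported in the text; the dictionary layout is illustrative.
CONFIG = {"batch_size": 32, "learning_rate": 1e-4, "epochs": 50, "num_classes": 3}

# Default input sizes of the three pre-trained models.
INPUT_SIZES = {
    "VGG16": (224, 224),
    "InceptionV3": (299, 299),
    "EfficientNetB0": (224, 224),
}

def build_classifier(name, weights="imagenet"):
    """Frozen pre-trained base plus a new three-way Softmax head (assumed layout)."""
    import tensorflow as tf  # imported lazily so the constants load without TensorFlow

    base_cls = getattr(tf.keras.applications, name)
    height, width = INPUT_SIZES[name]
    base = base_cls(include_top=False, weights=weights, input_shape=(height, width, 3))
    base.trainable = False  # freeze every pre-trained layer

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(CONFIG["num_classes"], activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=CONFIG["learning_rate"]),
        loss="categorical_crossentropy",  # assumed; the loss is not stated in the text
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier("InceptionV3")` would download the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is enough for checking shapes.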
The input images were divided into two sets, 80% for training and 20% for testing, as recommended
by [27]. The images were resized according to each model’s default size, 224x224 pixels for VGG16 and
EfficientNetB0, and 299x299 pixels for InceptionV3. The methods were trained with two training sets,
where the first set consists of original images and the second set consists of augmented images. Both sets
contain the same number of images, namely 2,400. In the second training set, the images were sheared
at an angle of 0.2 degrees, zoomed with a range of 0.2 and horizontally flipped using ImageDataGenerator in
Keras. ImageDataGenerator applies the transformations randomly in real time, so the number of images remains
the same. The augmentation parameters were selected based on observations from a few trials,
in which the features of the disease remained visible with these parameters.
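
The augmentation settings above might be configured as in the following sketch. The parameter values follow the text, while the directory layout (one subfolder per class), the categorical class mode, and the names `AUGMENTATION` and `make_train_iterator` are assumptions made for illustration.

```python
# Augmentation settings reported in the text, as ImageDataGenerator arguments.
AUGMENTATION = {
    "rescale": 1.0 / 255,      # pixel scaling reported in the setup
    "shear_range": 0.2,        # shear intensity
    "zoom_range": 0.2,         # random zoom range
    "horizontal_flip": True,   # random left-right flip
}

def make_train_iterator(image_dir, target_size=(224, 224), batch_size=32):
    """Yield randomly augmented batches from a directory with one subfolder per class."""
    # Lazy import so the settings above can be inspected without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    generator = ImageDataGenerator(**AUGMENTATION)
    return generator.flow_from_directory(
        image_dir,                 # hypothetical path supplied by the caller
        target_size=target_size,   # use (299, 299) for InceptionV3
        batch_size=batch_size,
        class_mode="categorical",
    )
```

Because the transformations are drawn afresh for every batch, each epoch sees different variants of the same 2,400 images rather than an enlarged dataset.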


[Figure 6 flow: image preparation -> input images -> CNN pretrained model (VGG16, InceptionV3 and
EfficientNetB0: convolution layer, pooling layer, flatten, fully connected layer) -> classified images]


Figure 6. The proposed chili plant disease classification framework 


4. RESULTS AND DISCUSSION 

The performance of the selected deep learning algorithms was evaluated based on accuracy, recall,
precision, and F1-score. Accuracy is the proportion of all samples that are correctly identified, and recall
is the proportion of actual positive samples that are correctly identified. Meanwhile, precision is the
proportion of samples identified as positive that are truly positive, and the F1-score is the harmonic mean of
recall (sensitivity) and precision. The experiments were conducted on two datasets, where the first dataset
consists of original images and the second dataset consists of augmented images. The results in Figure 7(a),
Figure 7(b) and Figure 7(c) show that EfficientNetB0 outperformed VGG16 and InceptionV3, whereas Figure 8(a),
Figure 8(b) and Figure 8(c) show that InceptionV3 outperformed VGG16 and EfficientNetB0. It is also observed
that, in both cases, VGG16 and InceptionV3 required less computation time than EfficientNetB0 to
reach above 90% accuracy.
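
The four metrics can be computed from a confusion matrix of counts, as in the sketch below. The counts are made up for a three-class test set with 200 images per class and are not the study's results; the function names are hypothetical.

```python
def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k; cm[i][j] counts true class i predicted as j."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # predicted k, actually another class
    fn = sum(cm[k]) - tp                             # actually k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Made-up counts: rows are true classes, columns are predicted classes.
cm = [[196, 2, 2],
      [4, 188, 8],
      [0, 4, 196]]
print(round(accuracy(cm), 4))    # 0.9667
print(per_class_metrics(cm, 0))  # precision, recall and F1 of the first class
```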
Figure 7. The accuracy produced by (a) VGG16, (b) InceptionV3, and (c) EfficientNetB0 using original
images for training
Figure 8. The accuracy produced by (a) VGG16, (b) InceptionV3, and (c) EfficientNetB0 using augmented
images for training


Nevertheless, the accuracy of the models trained on original images did not differ much from that of the
models trained on augmented images. At 50 epochs, InceptionV3 produced the highest accuracy of
98.83% on augmented images, while EfficientNetB0 produced the highest accuracy of 97.67% on original images.
The obtained accuracy, precision, recall, and F1-score for VGG16, InceptionV3 and
EfficientNetB0 are shown in Table 1. The performance of EfficientNetB0 and InceptionV3 is
approximately similar in both cases, that is, whether the models are trained on original or augmented images.
However, EfficientNetB0 performed better when trained on original images, whereas InceptionV3
performed better when trained on augmented images.

The classification results for each disease, namely upward curling (UC),
mosaic/mottling (M) and bacterial spot (BS), are summarized in the confusion matrices in Table 2.
InceptionV3 produced the highest true positive value when classifying bacterial spot
in both cases. For mosaic/mottling, InceptionV3 has the highest true positive value when
trained on augmented images, but its true positive value is lower than
that of EfficientNetB0 when trained on original images. All three methods have equivalent
performance when classifying the upward curling disease. The models also have difficulty
classifying the mosaic/mottling disease, as its true positive value is the lowest among the three
diseases. The complex features of the mosaic/mottling disease are approximately similar to those of the upward
curling disease, which caused the methods to misclassify.
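
The per-class fractions in a confusion matrix such as Table 2 are obtained by dividing each row of counts by the number of images in that true class. The sketch below illustrates this and locates the most frequent confusion; the counts are invented for illustration and the helper names are hypothetical.

```python
def normalize_rows(cm):
    """Divide each row by its total so entries are fractions of the true class."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in cm]

def most_confused(cm, labels):
    """Return (true, predicted, value) for the largest off-diagonal entry."""
    best = max(
        ((i, j) for i in range(len(cm)) for j in range(len(cm)) if i != j),
        key=lambda ij: cm[ij[0]][ij[1]],
    )
    return labels[best[0]], labels[best[1]], cm[best[0]][best[1]]

labels = ["BS", "M", "UC"]
# Made-up counts with 200 test images per class (rows: true, columns: predicted).
counts = [[194, 4, 2],
          [2, 182, 16],
          [0, 4, 196]]
norm = normalize_rows(counts)
print(most_confused(norm, labels))  # the (true, predicted) pair with the largest error
```

In this invented example the largest off-diagonal entry is mosaic/mottling predicted as upward curling, the same kind of confusion discussed above.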


Table 1. The VGG16, InceptionV3 and EfficientNetB0 testing performance

Training set                    Model            Accuracy (%)   Precision (%)   Recall (%)   F1-score
First set (original images)     VGG16            95.17          95.00           95.00        0.95
                                InceptionV3      97.50          97.00           97.00        0.97
                                EfficientNetB0   97.67          98.00           98.00        0.98
Second set (augmented images)   VGG16            95.83          96.00           96.00        0.96
                                InceptionV3      98.83          99.00           99.00        0.99
                                EfficientNetB0   96.83          97.00           97.00        0.97


Table 2. Confusion matrices of the methods trained using original and augmented images
(rows: true label; columns: predicted label)

First training set (original images)
True label   VGG16               InceptionV3         EfficientNetB0
             BS    M     UC      BS    M     UC      BS    M     UC
BS           0.97  0.02  0.01    0.99  0.0   0.01    0.98  0.0   0.02
M            0.01  0.91  0.08    0.02  0.96  0.02    0.0   0.98  0.02
UC           0.0   0.02  0.98    0.0   0.03  0.97    0.0   0.02  0.98

Second training set (augmented images)
True label   VGG16               InceptionV3         EfficientNetB0
             BS    M     UC      BS    M     UC      BS    M     UC
BS           0.98  0.02  0.0     1.0   0.0   0.0     0.98  0.0   0.02
M            0.02  0.90  0.08    0.0   0.98  0.02    0.0   0.94  0.06
UC           0.0   0.02  0.98    0.0   0.02  0.98    0.0   0.02  0.98


5. CONCLUSION AND FUTURE WORKS 
The efficacy of three deep learning algorithms, namely VGG16, InceptionV3 and EfficientNetB0,
in classifying three types of chilli plant diseases, namely upward curling, mosaic/mottling and
bacterial spot, from a dataset of 3,000 images captured in an uncontrolled environment and under various
imaging conditions is demonstrated. The experimental results showed that InceptionV3 performs better
than VGG16 and EfficientNetB0 in classifying bacterial spot images. Still, all the models have difficulty
classifying the mosaic/mottling disease because its complex features are
approximately similar to those of the upward curling disease. All three methods have equivalent performance when
classifying the upward curling disease. In conclusion, deep learning algorithms have shown great potential
for classifying chilli plant diseases. The performance of the algorithms can be further improved by exposing them
to high-complexity images and several other types of diseases, which will create more opportunities for
developing more advanced classifiers.